Mutations of SARS-CoV-2 Structural Proteins in the Alpha, Beta, Gamma, and Delta Variants: Bioinformatics Analysis

Background COVID-19 and Middle East Respiratory Syndrome are two major respiratory diseases caused by coronavirus species. COVID-19, a novel disease caused by SARS-CoV-2, was first reported in Wuhan, Hubei Province, China, in December 2019 and became a pandemic within 2-3 months, disrupting social and economic life worldwide. Despite the rapid development of vaccines, there have been obstacles to their distribution, including a lack of fundamental resources, poor immunization, and manual vaccine replication. Several variants of the original Wuhan strain have emerged in the last 3 years, which pose further challenges for disease control and vaccine development. Objective The aim of this study was to comprehensively analyze mutations in SARS-CoV-2 variants of concern (VoCs) using a bioinformatics approach, with the goal of identifying novel mutations that could serve as target sites for developing new vaccines. Methods Reference sequences of the SARS-CoV-2 spike (YP_009724390) and nucleocapsid (YP_009724397) proteins were compared with retrieved sequences of isolates of four VoCs from 14 countries for mutational and evolutionary analyses. Multiple sequence alignment was performed and phylogenetic trees were constructed by the neighbor-joining method with 1000 bootstrap replicates using MEGA (version 6). Mutations in amino acid sequences were analyzed using the MultAlin online tool (version 5.4.1). Results Among the four VoCs, a total of 143 nonsynonymous mutations and 8 deletions were identified in the spike and nucleocapsid proteins. Multiple sequence alignment and amino acid substitution analysis revealed new mutations, including G72W, M2101I, L139F, the 209-211 deletion, G212S, P199L, P67S, I292T, and substitutions by unknown amino acids, reported in Egypt (MW533289), the United Kingdom (MT906649), and other regions.
The variants B.1.1.7 (Alpha variant) and B.1.617.2 (Delta variant), characterized by higher transmissibility and lethality, harbored the amino acid substitutions D614G, R203K, and G204R, which were highly prevalent across most sequences. Phylogenetic analysis of the novel SARS-CoV-2 variant proteins and some previously reported β-coronavirus proteins indicated that the SARS-CoV-2 clades were either weakly supported or not supported at all in relation to the β-coronavirus species. Conclusions This study could contribute to a better understanding of the basic nature of SARS-CoV-2 and its four major variants. The numerous novel mutations detected could also provide a better understanding of VoCs and help identify suitable mutations as vaccine targets. Moreover, these data offer evidence for new types of mutations in VoCs, which will provide insight into the epidemiology of SARS-CoV-2.


Introduction
The emergence of SARS-CoV-2 during the early months of 2020 made headlines worldwide. Since then, several new variants of SARS-CoV-2 have emerged, which are classified into two groups according to the threat they pose to public health: variants of concern (VoCs) and variants of interest (VoIs) [1]. VoIs are defined as variants with specific genetic markers associated with changes that facilitate virus transmissibility, reduce the accuracy of diagnostic results, and reduce neutralization by antibodies acquired through natural infection or vaccination [2]. VoCs are associated with increased virus transmissibility and infection, reduced effectiveness of vaccines and treatments, failure of virus detection, and reduced levels of neutralizing antibodies generated by previous vaccination or infection. The main SARS-CoV-2 VoCs that have emerged include the Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Delta (B.1.617.2), and, most recently, Omicron variants [2]. The first VoC, B.1.1.7 (Alpha variant), was reported in the United Kingdom in December 2020 [3,4] and contained a total of 23 mutations [5]. These mutations occur in the open reading frame (ORF) 1ab and ORF8 regions of the genome, as well as in the genes encoding the spike (S) and nucleocapsid (N) proteins [6]. The Alpha variant was characterized by substantially higher infectivity and transmissibility. A total of seven mutations were reported in the S protein of this variant (N501Y, A570D, D614G, P681H, T716I, S982A, and D1118H), along with two deletions (Δ69-70 and Δ145) [7]. In addition, several mutations were identified in the N protein sequence of the Alpha variant, including D3L, P13L, D103Y, S197L, S188L, S93I, I292T, R203K, G204R, S190I, S194L, S202N, S235F, D348H, and D401Y [8].
In addition to S and N proteins, all of these VoCs contain the two other main structural proteins of SARS-CoV-2: membrane (M) protein and envelope (E) protein. Each of these proteins plays a vital role in the viral replication cycle. S protein mediates attachment of the virus to host cell surface receptors and fusion with the host cell, which enables the virus to enter the host cell [19]. The main function of N protein, which has unique properties compared with the other structural proteins, is to form the nucleocapsid by binding to the viral RNA genome. M protein, which is abundant in the virion, shapes the viral envelope, interacts with other viral proteins, and helps organize their assembly. E protein is the smallest structural protein of SARS-CoV-2, and its function is not yet fully understood. During replication, E protein is abundantly expressed in the host cell, but only a small portion of it is incorporated into the viral envelope [20]. Almost all of the major VoCs carry mutations in the receptor-binding domain (RBD) and N-terminal domain of S protein, and all of them except the Delta variant share one mutation, N501Y [21].
Recent studies have shown that several mutations contribute to the spread and lethality of SARS-CoV-2, and more than 10 SARS-CoV-2 variants, categorized as either VoCs or VoIs, have been reported to date [3]. However, it remains unclear how these sequences mutate as the virus is transmitted from person to person. To address this question, we retrieved a total of 127 full-length amino acid sequences of SARS-CoV-2 isolates from 14 countries submitted to NCBI up to July 15, 2021, and investigated amino acid substitutions in SARS-CoV-2 lineages and their mutational patterns in the major structural proteins.
In this study, we used bioinformatics methods to identify nonsynonymous mutations in the S and N proteins of the four main VoCs of SARS-CoV-2 and to assess how they might affect the structure and functional dynamics of the virus. This analysis may help to better understand the epidemiology of SARS-CoV-2 and its emerging VoCs, and might ultimately identify suitable mutations as new vaccine targets.

Data Source
Based on high prevalence rates, data were collected from isolates reported in 14 countries (Pakistan, Turkey, China, Iran, Morocco, United States, United Kingdom, France, Italy, Spain, India, Japan, Egypt, and Russia). A total of 127 SARS-CoV-2 sequences were retrieved from the National Center for Biotechnology Information (NCBI) Virus SARS-CoV-2 Data Hub [22] along with their major structural proteins (Table 1).
SARS-CoV-2 sequences for the four structural proteins (S, N, M, and E) were downloaded in FASTA format, together with the reference sequences for comparison. The 127 amino acid sequences obtained for analysis were converted into nucleotide sequences using the online reverse translation tool of the Sequence Manipulation Suite.
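Reverse translation is not unique: most amino acids are encoded by several codons, so a reverse translation tool typically has to pick one codon per residue (for example, from a codon usage table) or emit a degenerate code. The minimal sketch below assumes a fixed, illustrative table with one standard-genetic-code codon per amino acid and maps the unknown residue X to NNN. It is not the table or algorithm used by the Sequence Manipulation Suite, and `reverse_translate` is a hypothetical helper.

```python
# One representative codon per amino acid from the standard genetic code.
# These choices are illustrative only; they are NOT the codon usage table
# of the Sequence Manipulation Suite.
REPRESENTATIVE_CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
    "*": "TAA",
}


def reverse_translate(protein: str) -> str:
    """Return one possible coding sequence for `protein` (hypothetical helper)."""
    codons = []
    for residue in protein.upper():
        if residue == "X":
            codons.append("NNN")  # unknown amino acid: fully ambiguous codon
        elif residue in REPRESENTATIVE_CODON:
            codons.append(REPRESENTATIVE_CODON[residue])
        else:
            raise ValueError(f"unsupported residue: {residue!r}")
    return "".join(codons)


print(reverse_translate("MFVFL"))  # first residues of the spike protein
```

Because synonymous codons are collapsed to a single choice, the output is only one of many nucleotide sequences that translate back to the same protein, so nucleotide-level differences computed from it reflect the codon choice rather than the original genomes.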

Quality Profiling for Sequence Selection and Phylogenetic Analysis
Sequences were quality profiled during selection to distinguish between countries according to their epidemic records. This study included four SARS-CoV-2 variants, considering their S and N proteins. Data were compared by constructing a phylogenetic tree for each protein. All other SARS-CoV-2 variants and their respective proteins were excluded from the analysis.
Multiple sequence alignment was performed, and phylogenetic trees were constructed by the neighbor-joining method with 1000 bootstrap replicates, using MEGA (version 6) [23]. The alignment of the FASTA file was computed with a gap-opening penalty of 15, a gap-extension penalty of 6.66, and a delay divergent cutoff of 30%. Amino acid substitutions unique to SARS-CoV-2 were identified by visual inspection of the alignments.
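To make the neighbor-joining step concrete, here is a minimal sketch of the textbook Saitou-Nei algorithm (Q-criterion) operating on a distance matrix, with uncorrected p-distances shown as one possible input. The distance model, tie-breaking, and Newick output are assumptions: the text does not state which distance model was used in MEGA, and MEGA's implementation may differ. `p_distance` and `neighbor_joining` are hypothetical helpers.

```python
def p_distance(x: str, y: str) -> float:
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    pairs = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Build an unrooted neighbor-joining tree (3+ taxa) and return it as Newick."""
    # Distances keyed by node label; joined nodes are labeled by their Newick subtree.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # Q-criterion: join the pair minimizing (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:g},{b}:{lb:g})"
        d[new] = {new: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[new][k] = d[k][new] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [new]
    # Resolve the last three nodes around one central node.
    x, y, z = nodes
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    lz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"
```

On the classic five-taxon textbook matrix (illustrative values, not data from this study), `neighbor_joining(["a", "b", "c", "d", "e"], [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]])` returns `(d:2,e:1,(c:4,(a:2,b:3):3):2);`, where each number is a branch length in distance units.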

Mutation Identification With MultAlin
To detect widespread nonsynonymous mutations in the S, N, M, and E proteins, amino acid sequences were analyzed using MultAlin (version 5.4.1) [20], and each mutation was recorded separately.
MultAlin reports the position of each mutated site, which enabled the exact location of each mutation to be identified in the sequence of each strain.
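Position-based mutation labels depend on counting residues along the reference sequence. As a hypothetical sketch (not MultAlin's implementation), the following walks a pairwise alignment of a query against the reference, numbers positions on the ungapped reference, and reports substitutions in the D614G style and runs of deleted residues as Δstart-end. It assumes the alignment has already been computed, that `-` marks gaps, and that insertions relative to the reference can be ignored; `call_mutations` is an illustrative name.

```python
def _deletion_label(start: int, end: int) -> str:
    """Format a deletion of reference positions start..end (1-based, inclusive)."""
    return f"Δ{start}" if start == end else f"Δ{start}-{end}"


def call_mutations(ref_aln: str, qry_aln: str) -> list[str]:
    """List substitutions and deletions of an aligned query relative to the reference.

    Positions are counted along the ungapped reference, which is how labels
    such as D614G are numbered. Insertions relative to the reference are skipped.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    calls = []
    pos = 0           # current 1-based reference position
    del_start = None  # first reference position of an open deletion run
    for r, q in zip(ref_aln, qry_aln):
        if r == "-":
            continue  # column absent from the reference (insertion or gap-only column)
        pos += 1
        if q == "-":
            if del_start is None:
                del_start = pos
            continue
        if del_start is not None:  # a deletion run has just ended
            calls.append(_deletion_label(del_start, pos - 1))
            del_start = None
        if q != r:
            calls.append(f"{r}{pos}{q}")  # an X in the query marks an unknown residue
    if del_start is not None:
        calls.append(_deletion_label(del_start, pos))
    return calls
```

For example, `call_mutations("HVDG", "--DY")` yields `["Δ1-2", "G4Y"]`.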

Ethics Considerations
This study was based on analysis of secondary data that are publicly available at NCBI [22] and did not require any ethical approval.

Mutation Hotspots in the S Protein of SARS-CoV-2 Variants
The main VoCs of SARS-CoV-2 all contain the four major structural proteins S, N, M, and E, and numerous studies have elucidated similarities and differences among the viral genomes and their proteins using different bioinformatics tools [23]. Among the isolates from the 14 countries considered in this study, strong evidence was found for the occurrence of the D614G mutation (see Multimedia Appendix 1), in which aspartic acid (D) is replaced by glycine (G) at position 614. The D614G mutation affects the interaction with the host receptor angiotensin-converting enzyme 2 (ACE2), resulting in greater stability and more efficient transmission, although the mutant did not bind as efficiently as the original viral protein [24]. The majority of the Pakistan isolates (MW421982-92) also carried the D614G mutation along with some previously unreported mutations, including P26L, D80Y, S813N, Q1207H, D1163Y, and T1117I (see Multimedia Appendix 1). Two of the Egyptian isolates (MW533286, MW533289) displayed unique mutations (Q23X, S12X, Q677X, and P681X), in which glutamine (Q), serine (S), and proline (P) were replaced by an unknown amino acid (X) at different positions. These mutations might be important for future studies of the mechanism underlying virus lethality. A much higher number of mutations along with D614G was observed in the UK isolate MT906649, including a series of novel mutations (T22X, P25X, G142X, Y144-5X, S735X, and K1191X) in which threonine (T), proline (P), glycine (G), tyrosine (Y), serine (S), and lysine (K) were replaced by unknown amino acids, which may change the conformation of S protein (see Multimedia Appendices 1 and 2).
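Labels such as D614G encode the reference residue, its position, and the replacing residue, with X standing for an unknown amino acid. A small parser makes that convention explicit and reproduces the "X replaced by Y at position N" wording used in this section. `parse_substitution` and `describe` are hypothetical helpers, and only single-residue substitutions are handled in this sketch: deletions such as Δ69-70 and ranges such as Y144-5X are rejected.

```python
import re
from typing import NamedTuple

# Full names of the 20 standard amino acids plus X (unknown residue).
AMINO_ACID_NAMES = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "Q": "glutamine", "E": "glutamic acid", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
    "X": "an unknown amino acid",
}

# <reference residue><position><new residue>, e.g. D614G or Q23X.
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


class Substitution(NamedTuple):
    ref: str
    position: int
    alt: str

    @property
    def unknown_alt(self) -> bool:
        """True when the replacing residue could not be determined (X)."""
        return self.alt == "X"


def parse_substitution(label: str) -> Substitution:
    m = SUBSTITUTION.fullmatch(label)
    if m is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    return Substitution(m.group(1), int(m.group(2)), m.group(3))


def describe(label: str) -> str:
    s = parse_substitution(label)
    return (f"{AMINO_ACID_NAMES[s.ref]} replaced by "
            f"{AMINO_ACID_NAMES[s.alt]} at position {s.position}")
```

For instance, `describe("D614G")` gives `"aspartic acid replaced by glycine at position 614"`, and `parse_substitution("Q23X").unknown_alt` is `True`.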
Another mutation of concern identified in S protein was A570D, in which alanine (A) is replaced by aspartic acid (D) at position 570. A570D was found to co-occur with a Δ145 deletion and three other mutations: T716I, in which threonine (T) is replaced by isoleucine (I) at position 716; S982A, in which serine (S) is replaced by alanine (A) at position 982; and D1118H, in which aspartic acid (D) is replaced by histidine (H) at position 1118 (Multimedia Appendix 1). The mutations A570D, T716I, S982A, and D1118H arose as part of a series of accumulated mutations that collectively increased the lethality and transmissibility of the virus [25]. These mutations were observed simultaneously in isolates from the United States, India, Italy, and Spain; in addition, the US isolates (MW725912, MW725900, MW725904, MW725907, MW712865, MW712862, MW712864, MW725917, and MW725924) harbored the mutation G72W, in which glycine (G) is replaced by tryptophan (W) at position 72, along with D614G, although the effect of G72W remains unknown. The co-occurrence of A570D with D614G and S982A was also observed in some isolates from Italy (MW491232, MW711159), the United States (MZ311101 and MW725906), and India (MW600456), with no other novel mutations identified in these cases (Multimedia Appendices 1 and 2). Together, the mutations A570D, D614G, and S982A help minimize contact between the individual protomer chains of the trimeric spike, thereby promoting cleavage between the S1 and S2 subunits of S protein, which enhances fusion with the host cell and rearranges the overall dynamic structure of the virus [26].
The third most prominent mutation identified was N501Y, in which asparagine (N) is replaced by tyrosine (Y) at position 501 of S protein (Multimedia Appendix 1). The transmissibility of virus harboring the N501Y mutation (located in the receptor-binding motif) increased by 70%-80%, and this mutation also improved the binding affinity of the virus for host cells [27]. This mutation, in combination with 7 other mutations (A570D, P681H, T716I, S982A, D1118H, Δ69-70, and Δ145), has been termed a set of "mutations of major concern" [27,28]; these mutations were consistently detected in isolates from the United States, India, Italy, and Spain. The deletions of histidine at position 69 (Δ69) and valine at position 70 (Δ70) also evolved in other variants (Multimedia Appendix 1) and are considered responsible for increasing the transmissibility and infectivity of the virus, as well as for causing S gene target failure, in which the S gene target is not detected in some diagnostic PCR assays [7,29]. Another deletion, of tyrosine at position 144 (Δ144), is considered to change the conformation of the S protein surface, thereby facilitating evasion of host immunity and increasing infection [30]. Apart from these mutations, deletions at positions 85-89 (Δ85-Δ89) in a Spanish isolate (MW715071) were observed along with other unique S protein mutations, such as V90T (valine replaced by threonine at position 90), A93Y (alanine replaced by tyrosine at position 93), and D138H (aspartic acid replaced by histidine at position 138) (Multimedia Appendices 1 and 2). Although the specific functions of these mutations remain unknown, their identification and further analysis may help to better understand virus structure and lethality.
Additionally, the trio of mutations A220V in N protein (alanine replaced by valine at position 220), V30L in ORF10, and A222V in S protein was identified in Spanish isolates (MW715068-MW715080). In combination with other mutations, these mutations define different clades [31], although in the reported data the A220V mutation was identified without additional mutations. Furthermore, some of the main mutations of the South African (Beta) variant were L18F (leucine replaced by phenylalanine at position 18), D80A (aspartic acid replaced by alanine at position 80), D215G (aspartic acid replaced by glycine at position 215), R246I (arginine replaced by isoleucine at position 246), K417N (lysine replaced by asparagine at position 417), and E484K (glutamic acid replaced by lysine at position 484), along with N501Y, D614G, and A222V. The mutations K417N, E484K, and N501Y, located in the RBD, help the virus bind to the ACE2 receptors of host cells [9,10]. A recent study also reported that the E484K mutation might alter the conformation of S protein, thereby affecting the neutralizing capability of the host antibody response, as reinfection cases also increased among patients infected with isolates harboring E484K at its peak (ie, when the majority of isolates possessed the E484K mutation) during mid-2021 [27]. Several studies have also reported the E484K mutation as a major cause of the decreased effectiveness of current vaccines [27,32]. The mutations N501Y and E484K, along with L18F and K417T/N, are considered to decrease ACE2 binding affinity [33] and were reported in isolates from Italy (MW642250 and MW642248) and the United States (MZ320527). Some of the mutations of the Alpha and Gamma variants, such as N501Y, D614G, E484K, and A701V, were also observed in isolates from Italy and the United States (Multimedia Appendices 1 and 2).
As the variants continued to spread across different regions, another variant emerged in Spain toward the end of 2020. This variant possessed a distinctive mutation in S protein, A222V (alanine replaced by valine at position 222) (Multimedia Appendix 1). Unlike D614G, A222V alone had no direct impact on the transmissibility of the virus [34]; however, in combination with other Beta variant mutations such as L18F, D80A, K417N, E484K, N501Y, A701V, D215G, and deletions at positions 242-244 (Δ242-Δ244), A222V severely hinders antibody binding [29]. We found these mutations combined with D614G in Spanish isolates (MW715072 and MW715075). A new deletion (Δ139-Δ144) was also observed in two isolates from Spain (MW715068 and MW715078) along with the L18F, A222V, and D614G mutations (Multimedia Appendices 1 and 2).
In addition to the Alpha and Beta variants, another VoC was the Brazil (Gamma) variant, whose S protein mutations are almost identical to those of the Beta variant (N501Y, E484K, A701V) except for K417T, in which lysine (K) is replaced by threonine (T) at position 417, which also decreases ACE2 binding affinity [33]. These mutations, which are dominant in many VoCs and play an important role in ACE2 binding affinity during viral attachment [33], might also increase the chance of reinfection [35]. These mutations occurred in isolates from Italy (MW642250, MW642248, MW711159, and MW491232), the United States (MZ320527), and France (MW580244) (Multimedia Appendices 1 and 3).
Compared with other VoCs, the Delta variant was of major concern; it carries four signature mutations: L452R, T478K, D614G, and P681R. The P681R mutation, in which proline (P) is replaced by arginine (R) at position 681, increases the rate of cleavage between the S1 and S2 subunits at the furin cleavage site, facilitating virus transmissibility [33,36]. A virologist at Cornell University in New York remarked that "This little insert (P681R) sticks out and hits you in the face" [36]. The P681R mutation is considered responsible for the rapid spread of SARS-CoV-2 around the globe [36]. Three of these signature mutations (L452R, T478K, and D614G) were observed in isolates from Egypt (MW533290), India (MZ310590 and MZ310591), and Spain (MW715070) (Multimedia Appendix 4). L452R has been described as the only S protein mutation that strengthens attachment of the virus to the host cell surface, facilitating entry of the viral genetic material into host cells [4]. The L452R mutation was identified in isolates from the United States (MW725963) and Spain (MW715074) along with D614G, which is present in more than 90% of variants that have emerged since 2020 and confers increased replication and infectivity on the virus [24,36] (Multimedia Appendices 1 and 4).
These data indicate that the UK variant 20I/501Y.V1 (lineage B.1.1.7) and the Brazil variant 20J/501Y.V3 (lineage P.1) carry several mutations at specific positions of their sequences, which may cause structural and functional changes affecting virus lethality.

Mutation Hotspots in the N Protein of SARS-CoV-2 Variants
Among the other structural proteins of SARS-CoV-2, N protein, which is known to be more stable and conserved than other proteins, consists of three domains: the N-terminal domain, the serine/arginine-rich linker region, and the C-terminal domain [37]. The function of N protein is to form the viral nucleocapsid by binding to the RNA genome [38]. The co-occurring mutations R203K-G204R were observed in the serine/arginine-rich linker region of N protein (a highly flexible region involved in cellular processes such as the cell cycle), and they also affect virus assembly [11,39]. R203K-G204R mutations are found in the Alpha (along with D3L and S235F) and Gamma variants of SARS-CoV-2 (also expressed as G204R/X) [39,40]. The function of the G204X mutation is currently ambiguous, because the specific amino acid replacing glycine at position 204 is unknown. The amino acid substitutions R203K-G204R were identified in various isolates from 11 regions. The Egypt isolates (MW533286, MW533289) also possess the peculiar mutations G212X and G25X, in which glycine is replaced by an unknown residue at positions 212 and 25, respectively (Multimedia Appendix 2), whereas a Pakistan isolate (MW422070) harbors the mutations R203K, G204R, and D614G of the Alpha variant along with an additional mutation, A152X, in which alanine (A) is replaced by an unknown amino acid (X) at position 152 (Multimedia Appendix 3). The mutations R203K, G204R, and D614G also increase viral infectivity owing to a higher replication rate; thus, the presence of the dual mutation R203K/G204R in N protein along with the D614G and N501Y mutations of S protein results in an overall increase in disease severity and viral infectivity in the host [14].
The R203K/G204R and N501Y mutations were also associated with disease severity, viral infectivity, and an increased mortality rate of host cells [41,42]. Combinations of R203K/G204R and N501Y with the P80R, K417T, and E484K mutations were observed in isolates from Italy (MW642250, MW642248), the United States (MZ320527), and France (MW580244) (Multimedia Appendices 3 and 5). Conversely, the Delta variant possesses the R203M, G204R, and D377Y mutations, which might cause a functional disruption in viral efficiency [14]. The trio of mutations R203M, G204R, and D377Y was observed only in isolates from India (MZ702716, MZ310590, MZ310591) (Multimedia Appendices 4 and 5).
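Statements that a combination of mutations (such as R203M, G204R, and D377Y) occurs only in isolates from one country reduce to subset tests over per-isolate mutation sets. The sketch below uses hypothetical isolate IDs and mutation sets, not data from this study, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical toy profiles: isolate ID -> (country, mutations called in S and N).
# These IDs and mutation sets are illustrative only, not data from this study.
PROFILES = {
    "isolate_1": ("India", {"D614G", "R203M", "G204R", "D377Y"}),
    "isolate_2": ("India", {"R203M", "G204R", "D377Y"}),
    "isolate_3": ("Italy", {"N501Y", "R203K", "G204R"}),
    "isolate_4": ("United States", {"D614G", "R203K", "G204R"}),
}


def isolates_with_all(profiles, combo):
    """Isolates whose mutation set contains every mutation in `combo`."""
    combo = set(combo)
    return sorted(iso for iso, (_, muts) in profiles.items() if combo <= muts)


def countries_with_all(profiles, combo):
    """Countries with at least one isolate carrying the full combination."""
    return {profiles[iso][0] for iso in isolates_with_all(profiles, combo)}


def prevalence(profiles):
    """Percentage of isolates carrying each mutation."""
    counts = Counter(m for _, muts in profiles.values() for m in muts)
    return {m: 100.0 * c / len(profiles) for m, c in counts.items()}


trio = {"R203M", "G204R", "D377Y"}
print(isolates_with_all(PROFILES, trio), countries_with_all(PROFILES, trio))
```

With these toy profiles the trio is found only in `isolate_1` and `isolate_2`, so `countries_with_all` returns `{"India"}`, while G204R, present in every toy isolate, has a prevalence of 100%.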
Furthermore, one of the mutations of interest in N protein was S194L, which lies in a region responsible for protein oligomerization (formation of hetero-oligomers) [43]; these hetero-oligomers form an N-M protein complex that is critical for virus assembly [43,44]. The S194L mutation was identified with no other co-occurring mutations in isolates from India (MZ310512, MW600461-63), the United States (MW725958), and Iran (MT889692) (Multimedia Appendices 2 and 5). S194L was also identified during the SARS outbreak in 2003 [39]. In addition, another mutation, T205I, was identified in the majority of the global variants evaluated, including isolates from Spain (MW715082, MW715069), France (MW580244), the United States (MW725963), and India (MW595912, MW595915, MW595914, MZ310507) (Multimedia Appendix 6).

Mutation Hotspots in M and E Proteins of SARS-CoV-2 Variants
M protein interacts with the S and E proteins to establish the characteristic shape of the virus envelope and also helps connect and organize other viral proteins [45]. We identified only five mutations in M protein in our sequence analysis: V70L (valine replaced by leucine at position 70), F28X (phenylalanine replaced by an unknown amino acid at position 28), E12X (glutamic acid replaced by an unknown amino acid at position 12), I82T (isoleucine replaced by threonine at position 82), and a deletion at position 72 (Δ72) (Multimedia Appendices 7 and 8). The Δ72 deletion was observed in an isolate from Spain (MW375731), which also contains the S protein mutation D614G (Multimedia Appendix 1). The E12X and F28X mutations were observed in a UK isolate (MT906649), which also possesses the S protein mutation D614G and the E protein mutations T30I and L51X (Multimedia Appendix 7). The I82T mutation was present in an Indian isolate (MZ702716) that also harbored the T182I mutation of E protein (Multimedia Appendix 7); the L452R, T478K, D614G, and P681R mutations of S protein (Multimedia Appendix 1); and the N protein mutations R203M and D377Y of the Delta variant (Multimedia Appendix 5). The last mutation, V70L, was observed in an isolate from Egypt (MW533290), which stands out from all other sequences because it carries the most widespread mutations (present in almost every SARS-CoV-2 variant): the S protein mutations of the Alpha (D614G) and Delta (P681R) variants and the N protein mutations of the Gamma variant (R203K, G204X) (Multimedia Appendices 5, 7, and 8).
E protein of SARS-CoV-2 plays a significant role in the assembly, pathogenesis, envelope formation, and budding of the virus [7]. E protein, the smallest of the major structural proteins, is abundantly expressed inside the host cell, but only a small portion of it is incorporated into the virus envelope [46]. We identified five mutations in E protein: L28P (leucine replaced by proline at position 28), T30I (threonine replaced by isoleucine at position 30), L51X (leucine replaced by an unknown amino acid at position 51), V58F (valine replaced by phenylalanine at position 58), and P71L (proline replaced by leucine at position 71) (Multimedia Appendices 7 and 8). The V58F mutation was present in an isolate from India (MW595915), in addition to the D614G mutation of S protein (Multimedia Appendix 1) and the T205I mutation of N protein (Multimedia Appendix 5). The L28P mutation was observed only in an Iranian isolate (MT994881), with no other major mutations present. The third mutation, P71L, was present in US isolates (MW725914 and MW725923) along with the D614G, R203K, and G204R mutations. P71L was also observed in an isolate from France (MW580244) along with the N501Y and E484K mutations and the A701V mutation from S protein of the Gamma variant. In contrast, the T30I and L51X mutations were observed in a UK isolate (MT906649) along with D614G (from S protein) and E12X and F28X (from M protein) (Multimedia Appendices 1, 5, 7, and 8).
Based on the predicted functions of these major mutations, we conclude that four mutations in M protein and five mutations in E protein of SARS-CoV-2 variants, along with other mutations in S and N proteins, might increase the transmissibility, susceptibility, and lethality of the virus [8]. Additionally, analysis of the mutational patterns showed that the SARS-CoV-2 variants displayed unique mutations in isolates from different countries (Multimedia Appendix 9).

Phylogenetic Analysis
Along with an overall visual investigation of the relevant mutations, phylogenetic analysis was performed to analyze the evolutionary relationships among different strains of SARS-CoV-2.
Analysis of the nodes of the tree constructed with S protein sequences showed that hCoV-NL63 (YP_003767) and hCoV-229E (NP_073551) were strongly associated (100% bootstrap support), while SARS-CoV (NC_004718) and MERS-CoV (YP_009047204) formed a strongly supported clade (76%). Moreover, the reference sequence of S protein (YP_009724390) showed a weak association (68%) and was distantly related to the S protein sequences of other β-coronaviruses. In addition, the majority of the SARS-CoV-2 variants showed no support for clustering with the reference sequence, and some were only distantly related. Consequently, the support for clades differed among strains, yielding a set of contradictory nodes when the S protein cladograms were compared (Figure 1).
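The bootstrap percentages reported here (for example, 100%, 76%, and 68%) are the fraction of pseudo-replicate alignments, built by resampling alignment columns with replacement, in which a clade is recovered. For four sequences this can be sketched compactly, because the neighbor-joining Q-criterion then reduces to choosing the pairing with the smallest d(a,b) + d(c,d). The sketch assumes uncorrected p-distances and a fixed random seed; `quartet_split` and `bootstrap_support` are hypothetical helpers, not MEGA functions.

```python
import random


def p_distance(x: str, y: str) -> float:
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    pairs = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def canonical(split):
    """Order-independent form of a split such as (("a", "b"), ("c", "d"))."""
    return frozenset(frozenset(side) for side in split)


def quartet_split(seqs: dict[str, str]):
    """Split chosen by neighbor joining for exactly four aligned sequences.

    With four taxa, the NJ Q-criterion selects the pairing that minimizes
    d(a, b) + d(c, d), so the three possible splits can be compared directly.
    """
    a, b, c, d = sorted(seqs)
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def paired_length(split):
        (w, x), (y, z) = split
        return p_distance(seqs[w], seqs[x]) + p_distance(seqs[y], seqs[z])

    return canonical(min(splits, key=paired_length))


def bootstrap_support(seqs, split, replicates=1000, seed=1):
    """Percentage of column-resampled replicates that recover `split`."""
    rng = random.Random(seed)
    n_sites = len(next(iter(seqs.values())))
    target = canonical(split)
    hits = 0
    for _ in range(replicates):
        # Draw alignment columns with replacement to build a pseudo-replicate.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicate = {name: "".join(s[i] for i in cols) for name, s in seqs.items()}
        if quartet_split(replicate) == target:
            hits += 1
    return 100.0 * hits / replicates
```

When every column separates {a, b} from {c, d}, each replicate recovers that split and the support is 100%; weaker or conflicting signal spreads the replicates across the three possible splits and lowers the percentage, which is how low values such as those below 50% arise.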
The cladogram of N protein showed a different pattern from that of S protein. All four β-coronavirus sequences of N protein were strongly associated with each other (100%), but there was no support for an association with the reference sequence of N protein (YP_009724397). The clades of India (QQY49667, QQY679) and the United States (QSU75744, QSU75637) were well supported (79%-84%). Most bootstrap values were below 50%, and few clades had either weak or strong support. Overall, no evolutionary relationship was observed between the clades of the reference sequence and the retrieved sequences (Figure 2).
The clades of M and E proteins of the examined isolates and their reference sequences (YP_009724393 and YP_009724392, respectively) showed no association with the β-coronavirus species, whereas the β-coronavirus species were strongly associated among themselves (100%) (Multimedia Appendix 10). Overall, the neighbor-joining trees for the four major proteins indicated total divergence between the β-coronavirus species and the retrieved SARS-CoV-2 sequences, with only weak or no support among the SARS-CoV-2 clades. Branch lengths in the neighbor-joining trees represent genetic distances between species (Figures 1-2, Multimedia Appendix 10). Moreover, all the alternative and noncontradictory nodes as well as the repeatability of bootstrap values were rejected in this analysis.

Conclusion
The world has witnessed a global pandemic during the 21st century and the majority of nations have contributed to the development of vaccines. Nevertheless, there have been obstacles in the distribution of the vaccines, including a lack of fundamental resources, poor immunization, and manual vaccine replication. Overall, this study can offer a better understanding of the main VoCs of SARS-CoV-2. Several new mutations were detected in this study (see Multimedia Appendix 11), which may contribute to gaining a better understanding of the VoCs as well as in identifying suitable mutations for vaccine targets. These data can further provide evidence for new types of mutations in VoCs, which will help in gaining a better understanding of the epidemiology of SARS-CoV-2 and its dynamic mutational patterns.

Acknowledgments
We thank the National Center for Biotechnology Information online portal for providing free access to full-length genomes of SARS-CoV-2 variants. We also gratefully acknowledge the originating and submitting laboratories for providing the full viral genome sequences and the metadata included in this study. We further thank the software developers, including the developers of MEGA 6, CorelDRAW 12, and the MultAlin portal. The authors received no specific funding for this work.

Data Availability
The data sets generated during this study are available in the National Center for Biotechnology Information SARS-CoV-2 resources repository [22].